Reviews: On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective
One of my biggest reservations about the submission is that the two-layer fully connected neural network model under consideration is nonstandard: the weights of the second layer are fixed to binary values (i.e., +1 or -1 with uniform scaling) and are not updated by the gradient descent procedure. Correct me if I am wrong, but this assumption is neither commonly adopted in experiments nor identical to the setup in comparable theoretical works such as [DLL+18] or [ADH+19]. If the authors are convinced of the significance or general applicability of the suggested framework, they should take more care in communicating this to the audience. (See the first sketch below for the model as I understand it.)

A relatively minor issue concerns the significance of the c_1 term in Theorem 3. The authors nicely demonstrate that the constant c_1 satisfying (13) can be controlled via a sum whose dominating term is inversely proportional to the eigengap (lambda_{m_l} - lambda_{m_l+1}), which is later formalized in Theorem 4. (By the way, I had trouble locating a formal definition of eps(f^*, l).) The question is: can we guarantee that this gap is large enough to ensure that the term c_1 is negligible? I am particularly worried about this, as the authors have already mentioned, in line 145, that the spectrum of the random matrix concentrates as n grows (see the second sketch below).
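To make the first concern concrete, here is a minimal sketch, in my own words and not the authors' code, of the training setup as I read it: the second-layer weights are drawn once as +/-1 with uniform scaling and held fixed, and gradient descent updates only the first layer. All sizes (n, d, m) and the learning rate are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 10, 512  # samples, input dim, hidden width (assumed values)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                    # trainable first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed +/-1 second layer, uniform scaling

lr = 0.1
for _ in range(200):
    H = np.maximum(X @ W.T, 0.0)   # ReLU features, shape (n, m)
    resid = H @ a - y              # residual of f(X) = H @ a against targets
    # Gradient of 0.5/n * ||f(X) - y||^2 w.r.t. W only; `a` receives no update.
    grad_W = ((resid[:, None] * (H > 0)) * a[None, :]).T @ X / n
    W -= lr * grad_W
```

If this reading is correct, the contrast with settings where both layers are trained (as in the cited theoretical works) deserves an explicit discussion in the paper.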
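For the second concern, the following rough numerical check illustrates what worries me. It is my own stand-in, not the paper's exact construction: I assume Gaussian inputs and a Gaussian RBF kernel, form the normalized Gram matrix K/n, and watch the eigengap lambda_{m_l} - lambda_{m_l+1} as n grows; the index m_l and the bandwidth are hypothetical choices. If the gap shrinks with n, the 1/(lambda_{m_l} - lambda_{m_l+1}) factor bounding c_1 grows accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_l = 10, 5  # input dimension and eigenvalue index m_l (hypothetical choices)

for n in (200, 400, 800, 1600):
    X = rng.standard_normal((n, d))
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    K = np.exp(-D / (2.0 * d)) / n                 # normalized Gram matrix K / n
    lam = np.linalg.eigvalsh(K)[::-1]              # eigenvalues in descending order
    print(n, lam[m_l - 1] - lam[m_l])              # gap lambda_{m_l} - lambda_{m_l+1}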